ABSTRACT


Advances in artificial intelligence have enabled machines to perform many tasks with high
accuracy, such as detecting and classifying chest X-rays as cardiomegaly or healthy. The
goal of this paper is to develop a deep learning technique that classifies chest X-rays as
either healthy or cardiomegaly. First, we use the ChestX-ray8 dataset, which contains
medical images of many diseases, including cardiomegaly. Next, we apply preprocessing steps
to the dataset, such as resizing all images to the same size and normalizing them. Before
applying the deep learning technique, we apply data augmentation methods, such as random
rotation, random zoom, and random brightness. The deep learning technique used is VGG16, a
convolutional neural network model. The results show that the VGG16 model achieves an
accuracy of 91%, which is high compared with previous works.

1. INTRODUCTION


Chest radiography is commonly employed to
diagnose disorders affecting the thoracic bones,
the chest wall, and the structures inside the
thoracic cavity, such as the heart, lungs, and major
blood vessels. It is a frequently used diagnostic
modality for identifying pneumonia and
congestive heart failure [1]. Chest x-rays have also
been found to be effective for screening certain
chest disorders, despite their limitations in
providing a definitive diagnosis. When chest
radiography raises suspicion of a problem, it might
be necessary to carry out additional chest imaging
in order to make a firm diagnosis or to gather
evidence for the diagnosis that the initial
radiograph suggested. A chest x-ray is not deemed
necessary unless there is suspicion of a displaced,
cracked rib that may potentially harm the lungs and
other tissue structures [1, 2].

A chest x-ray can reveal problems in the following
areas: airways, breast shadows or bones, cardiac
silhouette, costophrenic angles, diaphragm, and
other structures [3, 4]. Although chest radiography
is a cost-effective and relatively low-risk approach
for examining chest ailments, certain significant
chest disorders can be present despite a chest x-ray
that appears normal. For example, a patient
diagnosed with acute myocardial infarction may
have a chest x-ray that appears entirely normal.
Hence, further evaluation may be needed to
establish a conclusive diagnosis [4].

Cardiomegaly is a health condition in which the
heart is enlarged, such that its size exceeds 50% of
the inner diameter of the rib cage [5]. Timely
identification of cardiomegaly therefore depends
on diagnosing the associated symptoms.
Evaluating cardiac dimensions using chest
radiography continues to be a valuable and
significant diagnostic measure. A chest x-ray
makes it easy to find the cardiothoracic ratio
(CTR), which can accurately identify heart
enlargement and predict cardiomegaly with a 95%
success rate. Early detection of this disease using
Artificial Intelligence techniques can therefore
help the medical system reduce the number of
cases or the death rate [6].
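
To make the 50% criterion concrete, the following minimal Python sketch computes the CTR
from two width measurements and applies the threshold. The function names and the example
measurements are illustrative assumptions rather than part of this study, and the sketch
does not cover how the two widths are measured on the radiograph.

```python
def cardiothoracic_ratio(cardiac_width: float, thoracic_width: float) -> float:
    """Return the CTR: cardiac width divided by the inner thoracic width."""
    if cardiac_width < 0 or thoracic_width <= 0:
        raise ValueError("cardiac width must be >= 0 and thoracic width must be > 0")
    return cardiac_width / thoracic_width


def suggests_cardiomegaly(ctr: float, threshold: float = 0.5) -> bool:
    """Flag an enlarged heart when the CTR exceeds the 50% criterion."""
    return ctr > threshold


# Hypothetical measurements in millimetres, chosen only for illustration.
ctr = cardiothoracic_ratio(cardiac_width=150.0, thoracic_width=280.0)
print(f"CTR = {ctr:.2f}, cardiomegaly suspected: {suggests_cardiomegaly(ctr)}")
```

The threshold is kept as a parameter so that a different cut-off can be tried without
changing the ratio computation.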

The term medical technology is commonly used to
encompass a range of instruments that empower
healthcare practitioners to improve the well-being
of patients and society. These tools achieve this by
facilitating early detection of ailments, minimizing
complications, optimizing treatment approaches,
offering less intrusive alternatives, and shortening
hospital stays [7]. Prior to the advent of mobile
technology, medical technologies primarily
consisted of traditional medical devices such as
prosthetics, stents, and implants. AI has led to a
huge revolution in the field of medical technology.
It is a subfield of computer science that specializes
in addressing intricate problems, particularly in
domains characterized by vast datasets and limited
theoretical frameworks [7, 8]. For example,
smartphones have become a popular tool for
filling in and distributing electronic personal health
information, monitoring vital functions through
biosensors, and promoting optimal therapeutic
compliance. As a result, patients are empowered to
take on a central role in their own care pathway [7].

The timely identification of cardiomegaly serves as
a significant indicator for several cardiac
conditions, such as cardiomyopathy, coronary
artery disease, hypertension, infectious ailments,
and renal disease. The radiographic assessment of
cardiomegaly relies on the cardiothoracic ratio
(CTR), a commonly employed metric that offers
valuable prognostic insights [9]. Regrettably, the
CTR in chest x-ray (CXR) images is currently
assessed manually, which takes a significant
amount of time. Additionally, some illnesses are
linked to an enlarged cardiomediastinal silhouette,
which might impede therapeutic decision-making.
Deep learning techniques, namely convolutional
neural networks (ConvNets), can make the analysis
of extensive and intricate medical examinations
more effective.

ConvNets take raw image pixel data as input and
progressively extract abstract representations of
the original image data, which makes it possible to
automate assessments such as coronary artery
calcium scoring [9, 10]. A tool that aids
radiologists in their interpretations would give
them more time for patient interactions.
Additionally, a tool that allows patients to seek a
second opinion could reduce instances of
misinterpretation and enhance the overall quality
of healthcare delivery [11].

The remaining sections of this paper are as follows:
Section 2 describes previous papers related to
cardiomegaly detection using different algorithms.
Section 3 presents the proposed methodology in
terms of datasets, data preprocessing, feature
extraction, and deep learning models. Section 4
presents and discusses the experimental results.
Finally, Section 5 concludes the paper and suggests
some future work.


2. LITERATURE REVIEW 


Table I summarizes previous papers that applied
different deep learning algorithms to chest X-ray
datasets to detect and classify cardiomegaly.

Chamveha et al. [12] put forth a computational
method for determining the cardiothoracic ratio
(CTR) from chest radiographs. They employed a
U-Net architecture with VGG16 to extract heart
and lung masks from chest X-ray images. The
dataset contains 245 images labelled with heart and
lung masks from the JSRT dataset. The chest X-ray
images in the dataset were collected using various
equipment from diverse hospitals. Consequently,
image intensity varies across the dataset, so the
images must be normalized before they are used in
a deep learning model. The authors employed
histogram equalization to standardize the images.
Subsequently, the extent of the masks was used to
determine the CTR. The CTR measurements were
assessed by human radiologists, who deemed
76.5% of them suitable for inclusion in medical
reports without any modification. The study
indicated a significant reduction in time and labour
for radiologists who use the automated solution.
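
Histogram equalization, the intensity standardization step mentioned above, can be
illustrated with a short Python sketch. This is a generic textbook formulation for 8-bit
grayscale images stored as nested lists; it is an assumption made for illustration and is
not taken from the implementation of Chamveha et al.

```python
def equalize_histogram(image, levels=256):
    """Spread the intensities of an 8-bit grayscale image over the full range."""
    flat = [pixel for row in image for pixel in row]

    # Count how often each intensity occurs.
    histogram = [0] * levels
    for pixel in flat:
        histogram[pixel] += 1

    # Build the cumulative distribution of intensities.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    cdf_min = next(value for value in cdf if value > 0)
    total = len(flat)
    if total == cdf_min:
        # A constant image has nothing to spread out.
        return [row[:] for row in image]

    # Map each intensity so that the cumulative distribution becomes linear.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((value - cdf_min) * scale)) for value in cdf]
    return [[lookup[pixel] for pixel in row] for row in image]


low_contrast = [[50, 50], [100, 150]]
print(equalize_histogram(low_contrast))
```

In the example, the darkest intensity is mapped to 0 and the brightest to 255, which is the
behaviour that makes images from different scanners more comparable.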
To efficiently diagnose and localize cardiomegaly,
Innat et al. [13] introduced a deep learning model
called Cardio-XAttentionNet. To create a
lightweight and efficient attention mapping
mechanism, they reexamined global average
pooling and incorporated a weighting term. The
model provides pixel-level localization while
relying only on image-level labeling and image
classification for cardiomegaly categorization
based on chest X-rays. To build
Cardio-XAttentionNet, they used some of the most
sophisticated ConvNet architectures as the
foundation of the proposed attention mapping
network. ChestX-ray14, a freely available chest
X-ray dataset, was used to build the proposed
model. For cardiomegaly classification, the best
model obtained an F1-score of 0.86, a precision of
0.87, an AUC of 0.89, and a recall of 0.85.

Zhou et al. [14] used deep learning techniques
(InceptionV3, ResNet-50, and Xception) to detect
and classify cardiomegaly in X-ray images. As
input, they used the "ChestX-ray8" database,
which consists of 108,948 X-ray scans spanning
the period from 1992 to 2015. For this study, a total
of 21,966 images were chosen. Of these, 767 were
labelled "cardiomegaly," while the remaining
images were labelled "healthy." The dataset was
divided into a training set containing 20,899
healthy images and 467 cardiomegaly images, and
a testing set containing 300 images in each of the
two categories, "cardiomegaly" and "healthy". The
training and testing sets were divided in this
manner because of the limited number of images
annotated with the label "cardiomegaly".
InceptionV3 and ResNet-50 gave the best
prediction accuracy of 0.797.

Bougias et al. [15] used four distinct transfer
learning algorithms to detect cardiomegaly from
chest X-rays. They compared and assessed the
algorithms' diagnostic capabilities, using the
radiologists' reports as the benchmark for accuracy.
They employed 2000 chest X-rays, of which 1000
were classified as normal and 1000 as
cardiomegaly. Deep features were extracted from
several networks, namely SqueezeNet, VGG16,
Google's Inception V3, and VGG19; the number of
extracted features is 2048. A logistic regression
technique, improved in terms of regularization,
was then used to classify chest X-rays into two
categories: those indicating the absence or the
presence of cardiomegaly. The authors used five
metrics to evaluate performance: accuracy,
Positive Predictive Value (PPV), sensitivity,
Negative Predictive Value (NPV), and specificity.
The VGG19 network gave the best performance,
with an accuracy of 84.5%.

Chen et al. [16] developed a deep learning model,
called high-dimensional multiple regression
analysis (MRA), to assess CXR images for quick
cardiomegaly screening based on two labels:
cardiomegaly or not. They conducted the tests on
the chest x-ray dataset gathered by the Clinical
Center in the USA: the National Institutes of
Health CXR Image Database. The model was
evaluated using 10-fold cross-validation and four
evaluation metrics: recall, accuracy, F1-score, and
precision. They showed that the MRA estimator
gave the best result, with an accuracy of 86.28%.


TABLE I. PREVIOUS PAPERS SUMMARIZATION

Ref | Year | Algorithms | Dataset | Metrics | Results
[12] | (illegible) | U-Net architecture with a VGG16 | 245 images | Accuracy | 76.5%
[13] | 2023 | Cardio-XAttentionNet | ChestX-ray14 | F1-score, Precision, AUC, Recall | AUC = 0.89
[14] | 2019 | InceptionV3, ResNet-50, Xception | 21,966 images from ChestX-ray8 | Accuracy, F1-score | Accuracy of InceptionV3 & ResNet-50 = 0.797
[15] | 2021 | Google's InceptionV3, VGG16, VGG19, SqueezeNet | 2000 chest X-rays | Accuracy, Sensitivity, Specificity, PPV, NPV | VGG19 accuracy = 84.5%
[16] | 2023 | Multiple regression analysis | 112,000 images from ChestX-ray8 dataset | Precision, Recall, Accuracy, F1-score | Accuracy = 86.28%
[17] | 2023 | CXRDANet | ChestX-ray14, NLM-CXR | Accuracy, Sensitivity, Specificity, F1 | Accuracy = 0.9050
[18] | 2020 | Transfer learning | 952 chest X-ray images | Accuracy | Accuracy = 82%
[19] | 2021 | Dense convolutional neural network | Chest X-ray datasets | AUROC | AUROC = 82.9


3. METHODOLOGY


Fig. 1 shows the proposed methodology
used in this paper to detect and classify
chest x-rays as cardiomegaly or healthy. The
methodology comprises the dataset, the
preprocessing steps applied, the data
augmentation techniques, and the deep
learning algorithms used.


[Figure 1: ChestX-ray8 Database (Cardiomegaly Disease) -> Dataset Preprocessing -> Dataset Augmentation -> Deep Learning Models -> Evaluation Metrics]

Fig. 1. Flow Chart Of Proposed Methodology


DATASET OVERVIEW


In this study, we use the "ChestX-ray8"
database, which contains 108,948 X-ray
images obtained from 32,717 distinct
individuals. This dataset covers 8 diseases:
pneumothorax, cardiomegaly, atelectasis,
mass, nodule, pneumonia, infiltration, and
effusion. We used the chest x-rays labelled
with Cardiomegaly, and images of the other
diseases are labelled as healthy, in order to
classify the images as Cardiomegaly or not.
Of these images, 4,000 chest x-ray images
are related to Cardiomegaly.

Fig. 2 shows a sample of the dataset in
CSV format, which describes each image
with features such as image index, label
(disease type), patient ID, width, height, age,
and position.


[Figure 2: excerpt of the dataset CSV with the columns Image Index, Finding Labels, Follow-up, Patient ID, Patient Age, Patient Gender, View Position, OriginalImage[Width Height], and OriginalImagePixelSpacing; the sample row shows the finding labels Consolidation|Effusion, age 25, gender M, and view position AP]

Fig. 2. Dataset Samples


DATASET PREPROCESSING


Prior to further analysis, the images need to
undergo a preprocessing stage. This encompasses
several modifications, such as alterations to the
dimensions, alignment, and hue. The objective of
preprocessing is to enhance the quality of an
image, which facilitates more effective analysis.
Preprocessing techniques remove undesirable
distortions and enhance the key attributes that are
crucial for the application under consideration;
these attributes vary with the specific application.
Image preprocessing is an important step that
makes the dataset suitable for the next process. In
this dataset, we resize the images to a fixed size of
512 x 512.
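
The labelling rule and the preprocessing steps described so far can be sketched in a few
lines of Python. The sketch is only an illustration under stated assumptions: the label
string follows the "Finding Labels" column of the CSV, the nearest-neighbour resize and
the scaling of pixel values to [0, 1] are assumed choices (the text does not specify the
interpolation or the normalization range), and all function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def cardiomegaly_label(finding_labels: str) -> int:
    """Return 1 if Cardiomegaly is among the '|'-separated findings, else 0."""
    findings = [name.strip() for name in finding_labels.split("|")]
    return 1 if "Cardiomegaly" in findings else 0


def resize_nearest(image: Image, size=(512, 512)) -> Image:
    """Resize with nearest-neighbour sampling (an assumed interpolation)."""
    out_h, out_w = size
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image: Image, max_value: float = 255.0) -> Image:
    """Scale pixel values to the range [0, 1] (an assumed normalization)."""
    return [[pixel / max_value for pixel in row] for row in image]


# A tiny 2 x 2 image stands in for a radiograph to keep the example readable.
tiny = [[0, 255], [128, 64]]
prepared = normalize(resize_nearest(tiny, size=(4, 4)))
print(cardiomegaly_label("Cardiomegaly|Effusion"), len(prepared), len(prepared[0]))
```

Calling `resize_nearest(image)` with the default size yields the 512 x 512 format used in
this study; the smaller size in the example only keeps the output short.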

The dataset is split into training, validation, and
testing groups. The training set is used to build the
model, while the testing set is used to evaluate the
VGG16 model after it is built. The validation set is
used to evaluate the model during the training
process in order to increase the accuracy and
handle the overfitting issue. The training set is 70%
of the dataset, and the remaining 30% is divided
into two sets: 40% of it (12% of the dataset) is used
for validation and the remainder (18% of the
dataset) for testing.
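
The split proportions can be checked with a short Python sketch. It assumes a simple
random shuffle with a fixed seed; the text does not state whether the original split was
stratified or grouped by patient, so the function name, the seed, and the shuffling
strategy are illustrative assumptions.

```python
import random


def split_indices(n_samples: int, seed: int = 0):
    """Split sample indices into 70% training, 12% validation, and 18% testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # 70% of the samples go to training.
    n_train = n_samples * 70 // 100
    holdout = indices[n_train:]

    # 40% of the remaining 30% is used for validation, the rest for testing.
    n_val = len(holdout) * 40 // 100
    return indices[:n_train], holdout[:n_val], holdout[n_val:]


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 700 120 180
```

Integer arithmetic is used for the percentages so that the subset sizes do not depend on
floating-point rounding.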


DATASET AUGMENTATION 


Data augmentation (DA) is a method used to
enlarge the training set by creating modified
copies of existing data. The following list
comprises several frequently encountered
examples:


• Rotation: images can be rotated by various
angles, such as 90 and 180 degrees. This is
beneficial when a model is required to
accurately detect and classify items that are
positioned at various angles. One often
employed augmentation is a rotation of
90 degrees.

• Random crop: an image is cropped at a
randomly selected place. This may leave an
object partially obscured, which improves
the model's ability to recognize objects that
are not completely visible.

• Exposure: the brightness levels of an image
are adjusted, either by increasing or
decreasing them. This proves advantageous
when a model is expected to be used in
environments with varying lighting
conditions.

• Blur: a blur effect can be applied to an
image.

• Flip: an image is mirrored vertically or
horizontally. It is advisable to refrain from
employing this augmentation when
engaged in the task of text recognition.

• Saturation: the color intensity within an
image is altered. This proves advantageous
in scenarios where the lighting conditions
inside the manufacturing area vary.

• Random noise: white and black pixels are
applied throughout an image, which
decreases visual clarity.

• Mosaic: many images are integrated into a
cohesive whole. Aerial imaging projects
can greatly benefit from this technique.

In our experiments, we used random
brightness in the range [0.7, 1.5], random
rotations of up to 3 degrees, and a random zoom of
0.125. Fig. 3 presents the part of the Python code
that resizes the images to the same size and
configures the augmentation based on
ImageDataGenerator.

    from keras.preprocessing.image import ImageDataGenerator
    from keras.applications.vgg16 import preprocess_input

    IMG_SIZE = (512, 512)
    # Some values below are garbled in the source figure; 0.1 and 0.01 are the
    # most plausible readings of the shift and shear ranges.
    core_idg = ImageDataGenerator(samplewise_center=False,
                                  samplewise_std_normalization=False,
                                  horizontal_flip=False,
                                  vertical_flip=False,
                                  # one argument here is illegible in the source figure
                                  width_shift_range=0.1,
                                  brightness_range=[0.7, 1.5],
                                  rotation_range=3,
                                  shear_range=0.01,
                                  # one argument here is illegible in the source figure
                                  zoom_range=0.125,
                                  preprocessing_function=preprocess_input)

Fig. 3. Data Augmentation Techniques With Parameters

Before the deep learning model is developed,
we generated the training, validation, and testing
sets using the data augmentation technique with
the following settings: the target label, a target size
equal to the given image size, the rgb color mode,
and the batch size (8 in training, 400 in testing, and
256 in validation). The target label must be
specified because the dataset contains 8 diseases,
of which the target is Cardiomegaly.

Fig. 4 shows a sample of the chest X-rays from
the dataset after the data preprocessing and data
augmentation were applied.

[Figure 4: sample chest X-ray panels, each labelled Cardiomegaly or Healthy]

Fig. 4. Sample Of Chest X-Rays


DEEP LEARNING MODEL


We used a CNN (convolutional neural
network) variant called VGG16 to accomplish the
detection and classification task. CNNs are used
for clustering images based on their similarities,
which facilitates efficient image search. Moreover,
these networks are capable of recognizing objects
in complex scenes. CNNs are commonly employed
to identify various visual elements, such as faces,
persons, street signs, tumours, and platypuses (or
platypi) [19].

The triumph of a deep convolutional
architecture known as AlexNet in the 2012
ImageNet competition reverberated globally.
CNNs are playing a pivotal role in generating
significant progress in the field of computer vision.
This technological advancement holds great
potential for various domains such as autonomous
vehicles, robots, unmanned aerial vehicles, security
systems, medical diagnostics, and interventions for
individuals with visual impairments [20].

Convolutional networks can undertake
mundane business-related tasks, which are both
financially advantageous and less complex. For
instance, they can be employed for optical
character recognition (OCR), which converts text
into digital format. This enables the application of
natural language processing techniques to analogue
and handwritten documents, where the images
represent symbols that need to be transcribed.
CNNs are also versatile outside the domain of
image recognition. Text analytics has witnessed the
direct use of these techniques [20].
These techniques can also be employed to analyze
sound when it is visually displayed as a
spectrogram and to process data using graph
convolutional networks [20]. The term "VGG16"
denotes the VGG model, which is alternatively
known as VGGNet. The model is a CNN with 16
layers. The VGG16 model has demonstrated a
top-5 accuracy of 92.7% on the ImageNet dataset,
which contains more than 14 million training
images spanning 1000 distinct object classes. This
model holds a prominent position among the
models that participated in the ILSVRC-2014
competition. Fig. 5 presents the part of the code for
the VGG16 model, which contains the
base-pretrained-model (VGG16) and the attention
model. Fig. 6 shows the VGG16 model summary
after the model is run.

    Layer (type)               Output Shape           Param #
    vgg16 (Model)              (None, 16, 16, 512)    14714688
    attention_model (Model)    (None, 1)              138690
    Total params: 14,853,378
    Trainable params: 137,154
    Non-trainable params: 14,716,224

Fig. 6. VGG16 Architecture.

Figure 7 shows the part of the code related to
the components of the attention model. These
components are three convolutional 2D layers with
different parameters and an Average Pool 2D
layer, followed by the last convolutional 2D layer.

    from keras.layers import Conv2D, AvgPool2D

    # bn_features is the feature tensor defined earlier in the model code.
    attn_layer = Conv2D(128, kernel_size=(1, 1), padding='same', activation='elu')(bn_features)
    attn_layer = Conv2D(32, kernel_size=(1, 1), padding='same', activation='elu')(attn_layer)
    attn_layer = Conv2D(16, kernel_size=(1, 1), padding='same', activation='elu')(attn_layer)
    attn_layer = AvgPool2D((2, 2), strides=(1, 1), padding='same')(attn_layer)
    attn_layer = Conv2D(1, kernel_size=(1, 1),
                        padding='valid',
                        activation='sigmoid',
                        name='AttentionMap2D')(attn_layer)

Figure 7: Attention Model


4. EXPERIMENTAL RESULTS


We report the findings obtained after applying
the VGG16 model, based on the following
evaluation metrics: recall, accuracy, F1-score, and
precision. The formulas below are used to calculate
these metrics, where FN = False Negative, TN =
True Negative, TP = True Positive, and FP = False
Positive:

1. Accuracy: the most intuitive performance
statistic, the ratio of correctly predicted
samples to all samples.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

2. Precision: the ratio of correctly predicted
positive samples to all samples predicted as
positive, whether true positives or false
positives.

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

3. Recall: the ratio of correctly predicted
positive samples to all actual positive
samples, that is, true positives plus false
negatives.

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

4. F1-score: the harmonic mean of precision
and recall.

$$\mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

We conducted many experiments on this
dataset with different VGG16 parameters, such as
the batch size, the number of epochs, and the test
ratio in the prediction process.

In the first experiment, we set the
aforementioned parameters as follows: batch size
= 4, number of epochs = 50, verbose = true, and
used the validation data. Table II shows the results
of the first experiment.

TABLE II. First Experiment Results

Model | Accuracy | Precision | Recall | F1-score
VGG16 | 76 | 77 | 76 | 77

In the second experiment, we set the
aforementioned parameters as follows: batch size
= 32, number of epochs = 50, verbose = true, and
used the validation data. Table III shows the results
of the second experiment.

TABLE III. Second Experiment Results

Model | Accuracy | Precision | Recall | F1-score
VGG16 | 81 | 83 | 81 | 81
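
As a hedged illustration of the four evaluation metrics defined in this section, the
following Python sketch computes them from confusion-matrix counts. The function name and
the counts in the example are hypothetical and are not results from these experiments.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1-score from TP, TN, FP, FN."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # The F1-score is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a binary cardiomegaly / healthy test set.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Each ratio falls back to 0.0 when its denominator is zero, which avoids a division error
when a model predicts no positive samples.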
In the third experiment, we set the
aforementioned parameters as follows: batch size
= 128, number of epochs = 50, verbose = true, and
used the validation data. Table IV shows the
results of the third experiment.

TABLE IV. Third Experiment Results

Model | Accuracy | Precision | Recall | F1-score

In the fourth experiment, we set the
aforementioned parameters as follows: batch size
= 128, number of epochs = 50, and verbose = true.
Table V shows the results of the fourth experiment.

[...] is a collection of medical images related to a
variety of illnesses, including cardiomegaly.
Subsequently, we performed preprocessing
operations on the dataset, such as resizing and
normalizing each image. Data augmentation
techniques such as random zoom, random rotation,
and random brightness are applied before the deep
learning technique is used. VGG16 is the deep
learning technique employed. When compared to
previous works, the VGG16 model obtains a high
accuracy of 91%, according to the results.